Chapter 6. Memory

Tyrell: If we give them a past, we create a cushion for their emotions and, consequently, we can control them better.

Deckard: Memories. You're talking about memories.

—the movie Blade Runner

In this chapter, you will learn everything you need to know about memory in embedded systems. In particular, you will learn about the types of memory you are likely to encounter, how to test memory devices to see if they are working properly, and how to use Flash memory.

6.1 Types of Memory

Many types of memory devices are available for use in modern computer systems. As an embedded software engineer, you must be aware of the differences between them and understand how to use each type effectively. In our discussion, we will approach these devices from a software viewpoint. As you are reading, try to keep in mind that the development of these devices took several decades and that there are significant physical differences in the underlying hardware. The names of the memory types frequently reflect the historical nature of the development process and are often more confusing than insightful.

Most software developers think of memory as being either random-access (RAM) or read-only (ROM). But, in fact, there are subtypes of each and even a third class of hybrid memories. In a RAM device, the data stored at each memory location can be read or written, as desired. In a ROM device, the data stored at each memory location can be read at will, but never written. In some cases, it is possible to overwrite the data in a ROM-like device. Such devices are called hybrid memories because they exhibit some of the characteristics of both RAM and ROM. Figure 6-1 provides a classification system for the memory devices that are commonly found in embedded systems.

Figure 6-1. Common memory types in embedded systems


6.1.1 Types of RAM

There are two important memory devices in the RAM family: SRAM and DRAM. The main difference between them is the lifetime of the data stored. SRAM (static RAM) retains its contents as long as electrical power is applied to the chip. However, if the power is turned off or lost temporarily then its contents will be lost forever. DRAM (dynamic RAM), on the other hand, has an extremely short data lifetime—usually less than a quarter of a second. This is true even when power is applied constantly.

In short, SRAM has all the properties of the memory you think of when you hear the word RAM. Compared to that, DRAM sounds kind of useless. What good is a memory device that retains its contents for only a fraction of a second? By itself, such a volatile memory is indeed worthless. However, a simple piece of hardware called a DRAM controller can be used to make DRAM behave more like SRAM. (See DRAM Controllers later in this chapter.) The job of the DRAM controller is to periodically refresh the data stored in the DRAM. By refreshing the data several times a second, the DRAM controller keeps the contents of memory alive for as long as they are needed. So, DRAM is as useful as SRAM after all.

DRAM Controllers

If your embedded system includes DRAM, there is probably a DRAM controller on board (or on-chip) as well. The DRAM controller is an extra piece of hardware placed between the processor and the memory chips. Its main purpose is to perform the refresh operations required to keep your data alive in the DRAM. However, it cannot do this properly without some help from you.

One of the first things your software must do is initialize the DRAM controller. If you do not have any other RAM in the system, you must do this before creating the stack or heap. As a result, this initialization code is usually written in assembly language and placed within the hardware initialization module.

Almost all DRAM controllers require a short initialization sequence that consists of one or more setup commands. The setup commands tell the controller about the hardware interface to the DRAM and how frequently the data there must be refreshed. To determine the initialization sequence for your particular system, consult the designer of the board or read the databooks that describe the DRAM and DRAM controller. If the DRAM in your system does not appear to be working properly, it could be that the DRAM controller either is not initialized or has been initialized incorrectly.
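
As a rough illustration, a C-level version of such an initialization sequence might look like the sketch below. The register names, addresses, and values are entirely hypothetical; the real sequence must come from the databooks for your particular DRAM and DRAM controller. (And, as noted above, if no other RAM is working yet, the equivalent code would normally be written in assembly so that no stack is required.)

/*
 * Hypothetical memory-mapped DRAM controller registers -- for
 * illustration only.  Consult your databooks for the real interface.
 */
#define DRAMC_CONFIG    (*(volatile unsigned short *) 0xFF80)
#define DRAMC_REFRESH   (*(volatile unsigned short *) 0xFF82)

void
dramInit(void)
{
    DRAMC_CONFIG  = 0x0021;    /* Hypothetical: bus width, banks, and timing. */
    DRAMC_REFRESH = 0x0200;    /* Hypothetical: refresh interval, in clocks.  */
}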

When deciding which type of RAM to use, a system designer must consider access time and cost. SRAM devices offer extremely fast access times (approximately four times faster than DRAM) but are much more expensive to produce. Generally, SRAM is used only where access speed is extremely important. A lower cost per byte makes DRAM attractive whenever large amounts of RAM are required. Many embedded systems include both types: a small block of SRAM (a few hundred kilobytes) along a critical data path and a much larger block of DRAM (in the megabytes) for everything else.

6.1.2 Types of ROM

Memories in the ROM family are distinguished by the methods used to write new data to them (usually called programming) and the number of times they can be rewritten. This classification reflects the evolution of ROM devices from hardwired to one-time programmable to erasable-and-programmable. A common feature across all these devices is their ability to retain data and programs forever, even during a power failure.

The very first ROMs were hardwired devices that contained a preprogrammed set of data or instructions. The contents of the ROM had to be specified before chip production, so the actual data could be used to arrange the transistors inside the chip! Hardwired memories are still used, though they are now called "masked ROMs" to distinguish them from other types of ROM. The main advantage of a masked ROM is a low production cost. Unfortunately, the cost is low only when hundreds of thousands of copies of the same ROM are required.

One step up from the masked ROM is the PROM (programmable ROM), which is purchased in an unprogrammed state. If you were to look at the contents of an unprogrammed PROM, you would see that the data is made up entirely of 1's. The process of writing your data to the PROM involves a special piece of equipment called a device programmer. The device programmer writes data to the device one word at a time, by applying an electrical charge to the input pins of the chip. Once a PROM has been programmed in this way, its contents can never be changed. If the code or data stored in the PROM must be changed, the current device must be discarded. As a result, PROMs are also known as one-time programmable (OTP) devices.

An EPROM (erasable-and-programmable ROM) is programmed in exactly the same manner as a PROM. However, EPROMs can be erased and reprogrammed repeatedly. To erase an EPROM, you simply expose the device to a strong source of ultraviolet light. (There is a "window" in the top of the device to let the ultraviolet light reach the silicon.) By doing this, you essentially reset the entire chip to its initial—unprogrammed—state. Though more expensive than PROMs, their ability to be reprogrammed makes EPROMs an essential part of the software development and testing process.

6.1.3 Hybrid Types

As memory technology has matured in recent years, the line between RAM and ROM devices has blurred. There are now several types of memory that combine the best features of both. These devices do not belong to either group and can be collectively referred to as hybrid memory devices. Hybrid memories can be read and written as desired, like RAM, but maintain their contents without electrical power, just like ROM. Two of the hybrid devices, EEPROM and Flash, are descendants of ROM devices; the third, NVRAM, is a modified version of SRAM.

EEPROMs are electrically-erasable-and-programmable. Internally, they are similar to EPROMs, but the erase operation is accomplished electrically, rather than by exposure to ultraviolet light. Any byte within an EEPROM can be erased and rewritten. Once written, the new data will remain in the device forever—or at least until it is electrically erased. The tradeoff for this improved functionality is mainly higher cost. Write cycles are also significantly longer than writes to a RAM, so you wouldn't want to use an EEPROM for your main system memory.

Flash memory is the most recent advancement in memory technology. It combines all the best features of the memory devices described thus far. Flash memory devices are high density, low cost, nonvolatile, fast (to read, but not to write), and electrically reprogrammable. These advantages are overwhelming and the use of Flash memory has increased dramatically in embedded systems as a direct result. From a software viewpoint, Flash and EEPROM technologies are very similar. The major difference is that Flash devices can be erased only one sector at a time, not byte by byte. Typical sector sizes are in the range of 256 bytes to 16 kilobytes. Despite this disadvantage, Flash is much more popular than EEPROM and is rapidly displacing many of the ROM devices as well.

The third member of the hybrid memory class is NVRAM (nonvolatile RAM). Nonvolatility is also a characteristic of the ROM and hybrid memories discussed earlier. However, an NVRAM is physically very different from those devices. An NVRAM is usually just an SRAM with a battery backup. When the power is turned on, the NVRAM operates just like any other SRAM. But when the power is turned off, the NVRAM draws just enough electrical power from the battery to retain its current contents. NVRAM is fairly common in embedded systems. However, it is very expensive—even more expensive than SRAM—so its applications are typically limited to the storage of only a few hundred bytes of system-critical information that cannot be stored in any better way.

Table 6-1 summarizes the characteristics of different memory types.

Table 6-1. Memory Device Characteristics

Memory Type   Volatile?   Writeable?              Erase Size    Erase Cycles          Relative Cost   Relative Speed
SRAM          yes         yes                     byte          unlimited             expensive       fast
DRAM          yes         yes                     byte          unlimited             moderate        moderate
Masked ROM    no          no                      n/a           n/a                   inexpensive     fast
PROM          no          once, with programmer   n/a           n/a                   moderate        fast
EPROM         no          yes, with programmer    entire chip   limited (see specs)   moderate        fast
EEPROM        no          yes                     byte          limited (see specs)   expensive       fast to read, slow to write
Flash         no          yes                     sector        limited (see specs)   moderate        fast to read, slow to write
NVRAM         no          yes                     byte          none                  expensive       fast

6.2 Memory Testing

One of the first pieces of serious embedded software you are likely to write is a memory test. Once the prototype hardware is ready, the designer would like some reassurance that she has wired the address and data lines correctly and that the memory chips are working properly. At first this might seem like a fairly simple assignment, but as you look at the problem more closely you will realize that it can be difficult to detect subtle memory problems with a simple test. In fact, as a result of programmer naiveté, many embedded systems include memory tests that would detect only the most catastrophic memory failures. Some of these might not even notice that the memory chips have been removed from the board!

Direct Memory Access

Direct memory access (DMA) is a technique for transferring blocks of data directly between two hardware devices. In the absence of DMA, the processor must read the data from one device and write it to the other, one byte or word at a time. If the amount of data to be transferred is large, or the frequency of transfers is high, the rest of the software might never get a chance to run. However, if a DMA controller is present it is possible to have it perform the entire transfer, with little assistance from the processor.

Here's how DMA works. When a block of data needs to be transferred, the processor provides the DMA controller with the source and destination addresses and the total number of bytes. The DMA controller then transfers the data from the source to the destination automatically. After each byte is copied, each address is incremented and the number of bytes remaining is reduced by one. When the number of bytes remaining reaches zero, the block transfer ends and the DMA controller sends an interrupt to the processor.

In a typical DMA scenario, the block of data is transferred directly to or from memory. For example, a network controller might want to place an incoming network packet into memory as it arrives, but only notify the processor once the entire packet has been received. By using DMA, the processor can spend more time processing the data once it arrives and less time transferring it between devices. The processor and DMA controller must share the address and data buses during this time, but this is handled automatically by the hardware and the processor is otherwise uninvolved with the actual transfer.
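
To make the handshake concrete, here is a rough sketch of what programming such a controller might look like from C. The register names, addresses, and start bit are invented for illustration; a real DMA controller's databook defines the actual interface.

/*
 * Hypothetical memory-mapped DMA controller registers -- for
 * illustration only.
 */
#define DMA_SRC     (*(volatile unsigned long *)  0xFF90)
#define DMA_DST     (*(volatile unsigned long *)  0xFF94)
#define DMA_COUNT   (*(volatile unsigned short *) 0xFF98)
#define DMA_CTRL    (*(volatile unsigned short *) 0xFF9A)
#define DMA_START   0x0001

void
dmaTransfer(unsigned long source, unsigned long destination, unsigned short nBytes)
{
    DMA_SRC   = source;         /* Source address of the block.                */
    DMA_DST   = destination;    /* Destination address of the block.           */
    DMA_COUNT = nBytes;         /* Total number of bytes to transfer.          */
    DMA_CTRL  = DMA_START;      /* Begin; completion is signaled by interrupt. */
}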

The purpose of a memory test is to confirm that each storage location in a memory device is working. In other words, if you store the number 50 at a particular address, you expect to find that number stored there until another number is written. The basic idea behind any memory test, then, is to write some set of data values to each address in the memory device and verify the data by reading it back. If all the values read back are the same as those that were written, then the memory device is said to pass the test. As you will see, it is only through careful selection of the set of data values that you can be sure that a passing result is meaningful.

Of course, a memory test like the one just described is unavoidably destructive. In the process of testing the memory, you must overwrite its prior contents. Because it is generally impractical to overwrite the contents of nonvolatile memories, the tests described in this section are generally used only for RAM testing. However, if the contents of a hybrid memory are unimportant—as they are during the product development stage—these same algorithms can be used to test those devices as well. The problem of validating the contents of a nonvolatile memory is addressed in a later section of this chapter.

6.2.1 Common Memory Problems

Before learning about specific test algorithms, you should be familiar with the types of memory problems that are likely to occur. One common misconception among software engineers is that most memory problems occur within the chips themselves. Though a major issue at one time (a few decades ago), problems of this type are increasingly rare. The manufacturers of memory devices perform a variety of post-production tests on each batch of chips. If there is a problem with a particular batch, it is extremely unlikely that one of the bad chips will make its way into your system.

The one type of memory chip problem you could encounter is a catastrophic failure. This is usually caused by some sort of physical or electrical damage received by the chip after manufacture. Catastrophic failures are uncommon and usually affect large portions of the chip. Because a large area is affected, it is reasonable to assume that catastrophic failure will be detected by any decent test algorithm.

In my experience, a more common source of memory problems is the circuit board. Typical circuit board problems are:

- Problems with the wiring between the processor and the memory device
- Missing memory chips
- Improperly inserted memory chips

These are the problems that a good memory test algorithm should be able to detect. Such a test should also be able to detect catastrophic memory failures without specifically looking for them. So let's discuss circuit board problems in more detail.

6.2.1.1 Electrical wiring problems

An electrical wiring problem could be caused by an error in design or production of the board or as the result of damage received after manufacture. Each of the wires that connect the memory device to the processor is one of three types: an address line, a data line, or a control line. The address and data lines are used to select the memory location and to transfer the data, respectively. The control lines tell the memory device whether the processor wants to read or write the location and precisely when the data will be transferred. Unfortunately, one or more of these wires could be improperly routed or damaged in such a way that it is either shorted (i.e., connected to another wire on the board) or open (not connected to anything). These problems are often caused by a bit of solder splash or a broken trace, respectively. Both cases are illustrated in Figure 6-2.

Figure 6-2. Possible wiring problems


Problems with the electrical connections to the processor will cause the memory device to behave incorrectly. Data might be stored incorrectly, stored at the wrong address, or not stored at all. Each of these symptoms can be explained by wiring problems on the data, address, and control lines, respectively.

If the problem is with a data line, several data bits might appear to be "stuck together" (i.e., two or more bits always contain the same value, regardless of the data transmitted). Similarly, a data bit might be either "stuck high" (always 1) or "stuck low" (always 0). These problems can be detected by writing a sequence of data values designed to test that each data pin can be set to 0 and 1, independently of all the others.

If an address line has a wiring problem, the contents of two memory locations might appear to overlap. In other words, data written to one address will instead overwrite the contents of another address. This happens because an address bit that is shorted or open will cause the memory device to see an address different from the one selected by the processor.

Another possibility is that one of the control lines is shorted or open. Although it is theoretically possible to develop specific tests for control line problems, it is not possible to describe a general test for them. The operation of many control signals is specific to the processor or memory architecture. Fortunately, if there is a problem with a control line, the memory will probably not work at all, and this will be detected by other memory tests. If you suspect a problem with a control line, it is best to seek the advice of the board's designer before constructing a specific test.

6.2.1.2 Missing memory chips

A missing memory chip is clearly a problem that should be detected. Unfortunately, because of the capacitive nature of unconnected electrical wires, some memory tests will not detect this problem. For example, suppose you decided to use the following test algorithm: write the value 1 to the first location in memory, verify the value by reading it back, write 2 to the second location, verify the value, write 3 to the third location, verify, etc. Because each read occurs immediately after the corresponding write, it is possible that the data read back represents nothing more than the voltage remaining on the data bus from the previous write. If the data is read back too quickly, it will appear that the data has been correctly stored in memory—even though there is no memory chip at the other end of the bus!

To detect a missing memory chip, the test must be altered. Instead of performing the verification read immediately after the corresponding write, it is desirable to perform several consecutive writes followed by the same number of consecutive reads. For example, write the value 1 to the first location, 2 to the second location, and 3 to the third location, then verify the data at the first location, the second location, etc. If the data values are unique (as they are in the test just described), the missing chip will be detected: the first value read back will correspond to the last value written (3), rather than the first (1).
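
The following fragment is a minimal sketch of that idea; the function name and the use of n consecutive byte locations are mine, for illustration (keep n no larger than 255 so the byte values remain distinct). The device test presented later in this chapter applies the same write-all-then-verify principle to an entire memory region.

int
quickPresenceCheck(volatile unsigned char * base, unsigned int n)
{
    unsigned int offset;

    /*
     * Write a distinct value to each of n consecutive locations first...
     */
    for (offset = 0; offset < n; offset++)
    {
        base[offset] = (unsigned char) (offset + 1);
    }

    /*
     * ...then go back and verify them.  A missing chip returns stale
     * bus data (or garbage) rather than the values just written.
     */
    for (offset = 0; offset < n; offset++)
    {
        if (base[offset] != (unsigned char) (offset + 1))
        {
            return (-1);
        }
    }

    return (0);
}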

6.2.1.3 Improperly inserted chips

If a memory chip is present but improperly inserted in its socket, the system will usually behave as though there is a wiring problem or a missing chip. In other words, some number of the pins on the memory chip will either not be connected to the socket at all or will be connected at the wrong place. These pins will be part of the data bus, address bus, or control wiring. So as long as you test for wiring problems and missing chips, any improperly inserted chips will be detected automatically.

Before going on, let's quickly review the types of memory problems we must be able to detect. Memory chips only rarely have internal errors, but if they do, they are probably catastrophic in nature and will be detected by any test. A more common source of problems is the circuit board, where a wiring problem can occur or a memory chip might be missing or improperly inserted. Other memory problems can occur, but the ones described here are the most common and also the simplest to test in a generic way.

6.2.2 Developing a Test Strategy

By carefully selecting your test data and the order in which the addresses are tested, it is possible to detect all of the memory problems described earlier. It is usually best to break your memory test into small, single-minded pieces. This helps to improve the efficiency of the overall test and the readability of the code. More specific tests can also provide more detailed information about the source of the problem, if one is detected.

I have found it is best to have three individual memory tests: a data bus test, an address bus test, and a device test. The first two test for electrical wiring problems and improperly inserted chips; the third is intended to detect missing chips and catastrophic failures. As an added benefit, the device test will also uncover problems with the control bus wiring, though it cannot provide useful information about the source of such a problem.

The order in which you execute these three tests is important. The proper order is: data bus test first, followed by the address bus test, and then the device test. That's because the address bus test assumes a working data bus, and the device test results are meaningless unless both the address and data buses are known to be good. If any of the tests fail, you should work with a hardware engineer to locate the source of the problem. By looking at the data value or address at which the test failed, she should be able to quickly isolate the problem on the circuit board.

6.2.2.1 Data bus test

The first thing we want to test is the data bus wiring. We need to confirm that any value placed on the data bus by the processor is correctly received by the memory device at the other end. The most obvious way to test that is to write all possible data values and verify that the memory device stores each one successfully. However, that is not the most efficient test available. A faster method is to test the bus one bit at a time. The data bus passes the test if each data bit can be set to 0 and 1, independently of the other data bits.

A good way to test each bit independently is to perform the so-called "walking 1's test." Table 6-2 shows the data patterns used in an 8-bit version of this test. The name, walking 1's, comes from the fact that a single data bit is set to 1 and "walked" through the entire data word. The number of data values to test is the same as the width of the data bus. This reduces the number of test patterns from 2^n to n, where n is the width of the data bus.

Table 6-2. Consecutive Data Values for the Walking 1's Test

00000001

00000010

00000100

00001000

00010000

00100000

01000000

10000000

Because we are testing only the data bus at this point, all of the data values can be written to the same address. Any address within the memory device will do. However, if the data bus splits as it makes its way to more than one memory chip, you will need to perform the data bus test at multiple addresses, one within each chip.

To perform the walking 1's test, simply write the first data value in the table, verify it by reading it back, write the second value, verify, etc. When you reach the end of the table, the test is complete. It is okay to do the read immediately after the corresponding write this time because we are not yet looking for missing chips. In fact, this test may provide meaningful results even if the memory chips are not installed!

The function memTestDataBus shows how to implement the walking 1's test in C. It assumes that the caller will select the test address, and tests the entire set of data values at that address. If the data bus is working properly, the function will return 0. Otherwise it will return the data value for which the test failed. The bit that is set in the returned value corresponds to the first faulty data line, if any.

typedef unsigned char datum;     /* Set the data bus width to 8 bits. */

/**********************************************************************
 *
 * Function:    memTestDataBus()
 *
 * Description: Test the data bus wiring in a memory region by
 *              performing a walking 1's test at a fixed address
 *              within that region.  The address (and hence the
 *              memory region) is selected by the caller.
 *
 * Notes:
 *
 * Returns:     0 if the test succeeds.
 *              A nonzero result is the first pattern that failed.
 *
 **********************************************************************/
datum
memTestDataBus(volatile datum * address)
{
    datum pattern;


    /*
     * Perform a walking 1's test at the given address.
     */
    for (pattern = 1; pattern != 0; pattern <<= 1)
    {
        /*
         * Write the test pattern.
         */
        *address = pattern;

        /*
         * Read it back (immediately is okay for this test).
         */
        if (*address != pattern)
        {
            return (pattern);
        }
    }

    return (0);

}   /* memTestDataBus() */
6.2.2.2 Address bus test

After confirming that the data bus works properly, you should next test the address bus. Remember that address bus problems lead to overlapping memory locations. There are many possible addresses that could overlap. However, it is not necessary to check every possible combination. You should instead follow the example of the previous data bus test and try to isolate each address bit during testing. You simply need to confirm that each of the address pins can be set to 0 and 1 without affecting any of the others.

The smallest set of addresses that will cover all possible combinations is the set of "power-of-two" addresses. These addresses are analogous to the set of data values used in the walking 1's test. The corresponding memory locations are 00001h, 00002h, 00004h, 00008h, 00010h, 00020h, and so forth. In addition, address 00000h must also be tested. The possibility of overlapping locations makes the address bus test harder to implement. After writing to one of the addresses, you must check that none of the others has been overwritten.

It is important to note that not all of the address lines can be tested in this way. Part of the address—the leftmost bits—selects the memory chip itself. Another part—the rightmost bits—might not be significant if the data bus width is greater than 8 bits. These extra bits will remain constant throughout the test and reduce the number of test addresses. For example, if the processor has 20 address bits, as the 80188EB does, then it can address up to 1 megabyte of memory. If you want to test a 128-kilobyte block of memory, the three most significant address bits will remain constant.[1] In that case, only the 17 rightmost bits of the address bus can actually be tested.

To confirm that no two memory locations overlap, you should first write some initial data value at each power-of-two offset within the device. Then write a new value—an inverted copy of the initial value is a good choice—to the first test offset, and verify that the initial data value is still stored at every other power-of-two offset. If you find a location (other than the one just written) that contains the new data value, you have found a problem with the current address bit. If no overlapping is found, repeat the procedure for each of the remaining offsets.

The function memTestAddressBus shows how this can be done in practice. The function accepts two parameters. The first parameter is the base address of the memory block to be tested, and the second is its size, in bytes. The size is used to determine which address bits should be tested. For best results, the base address should contain a 0 in each of those bits. If the address bus test fails, the address at which the first error was detected will be returned. Otherwise, the function returns NULL to indicate success.

/**********************************************************************

 * Function:    memTestAddressBus()
 *
 * Description: Test the address bus wiring in a memory region by
 *              performing a walking 1's test on the relevant bits
 *              of the address and checking for aliasing.  The test
 *              will find single-bit address failures such as stuck
 *              -high, stuck-low, and shorted pins.  The base address
 *              and size of the region are selected by the caller.
 *
 * Notes:       For best results, the selected base address should
 *              have enough LSB 0's to guarantee single address bit
 *              changes.  For example, to test a 64 KB region, select
 *              a base address on a 64 KB boundary.  Also, select the
 *              region size as a power-of-two--if at all possible.
 *
 * Returns:     NULL if the test succeeds.
 *              A nonzero result is the first address at which an
 *              aliasing problem was uncovered.  By examining the
 *              contents of memory, it may be possible to gather
 *              additional information about the problem.
 *
 **********************************************************************/
datum *
memTestAddressBus(volatile datum * baseAddress, unsigned long nBytes)
{
    unsigned long addressMask = (nBytes - 1);
    unsigned long offset;
    unsigned long testOffset;

    datum pattern     = (datum) 0xAAAAAAAA;
    datum antipattern = (datum) 0x55555555;

    /*
     * Write the default pattern at each of the power-of-two offsets.
     */
    for (offset = sizeof(datum); (offset & addressMask) != 0; offset <<= 1)
    {
        baseAddress[offset] = pattern;
    }

    /*
     * Check for address bits stuck high.
     */
    testOffset = 0;
    baseAddress[testOffset] = antipattern;

    for (offset = sizeof(datum); (offset & addressMask) != 0; offset <<= 1)
    {
        if (baseAddress[offset] != pattern)
        {
            return ((datum *) &baseAddress[offset]);
        }
    }

    baseAddress[testOffset] = pattern;

    /*
     * Check for address bits stuck low or shorted.
     */
    for (testOffset = sizeof(datum); (testOffset & addressMask) != 0;
         testOffset <<= 1)
    {
        baseAddress[testOffset] = antipattern;

        for (offset = sizeof(datum); (offset & addressMask) != 0;
             offset <<= 1)
        {
            if ((baseAddress[offset] != pattern) && (offset != testOffset))
            {
                return ((datum *) &baseAddress[testOffset]);
            }
        }

        baseAddress[testOffset] = pattern;
    }

    return (NULL);

}   /* memTestAddressBus() */
6.2.2.3 Device test

Once you know that the address and data bus wiring are correct, it is necessary to test the integrity of the memory device itself. The thing to test is that every bit in the device is capable of holding both 0 and 1. This is a fairly straightforward test to implement, but it takes significantly longer to execute than the previous two.

For a complete device test, you must visit (write and verify) every memory location twice. You are free to choose any data value for the first pass, so long as you invert that value during the second. And because there is a possibility of missing memory chips, it is best to select a set of data that changes with (but is not equivalent to) the address. A simple example is an increment test.

The data values for the increment test are shown in the first two columns of Table 6-3. The third column shows the inverted data values used during the second pass of this test. The second pass represents a decrement test. There are many other possible choices of data, but the incrementing data pattern is adequate and easy to compute.

Table 6-3. Data Values for an Increment Test

Memory Offset   Binary Value   Inverted Value
000h            00000001       11111110
001h            00000010       11111101
002h            00000011       11111100
003h            00000100       11111011
...             ...            ...
0FEh            11111111       00000000
0FFh            00000000       11111111

The function memTestDevice implements just such a two-pass increment/decrement test. It accepts two parameters from the caller. The first parameter is the starting address, and the second is the number of bytes to be tested. These parameters give the caller complete control over which areas of memory will be overwritten. The function will return NULL on success. Otherwise, the first address that contains an incorrect data value is returned.

/**********************************************************************
 *
 * Function:    memTestDevice()
 *
 * Description: Test the integrity of a physical memory device by
 *              performing an increment/decrement test over the
 *              entire region.  In the process every storage bit
 *              in the device is tested as a zero and a one.  The
 *              base address and the size of the region are
 *              selected by the caller.
 *
 * Notes:
 *
 * Returns:     NULL if the test succeeds.  Also, in that case, the
 *              entire memory region will be filled with zeros.
 *
 *              A nonzero result is the first address at which an
 *              incorrect value was read back.  By examining the
 *              contents of memory, it may be possible to gather
 *              additional information about the problem.
 *
 **********************************************************************/
datum *
memTestDevice(volatile datum * baseAddress, unsigned long nBytes)
{
    unsigned long offset;
    unsigned long nWords = nBytes / sizeof(datum);

    datum pattern;
    datum antipattern;

    /*
     * Fill memory with a known pattern.
     */
    for (pattern = 1, offset = 0; offset < nWords; pattern++, offset++)
    {
        baseAddress[offset] = pattern;
    }

    /*
     * Check each location and invert it for the second pass.
     */
    for (pattern = 1, offset = 0; offset < nWords; pattern++, offset++)
    {
        if (baseAddress[offset] != pattern)
        {
            return ((datum *) &baseAddress[offset]);
        }

        antipattern = ~pattern;
        baseAddress[offset] = antipattern;
    }

    /*
     * Check each location for the inverted pattern and zero it.
     */
    for (pattern = 1, offset = 0; offset < nWords; pattern++, offset++)
    {
        antipattern = ~pattern;
        if (baseAddress[offset] != antipattern)
        {
            return ((datum *) &baseAddress[offset]);
        }

        baseAddress[offset] = 0;
    }

    return (NULL);

}   /* memTestDevice() */
6.2.2.4 Putting it all together

To make our discussion more concrete, let's consider a practical example. Suppose that we wanted to test the second 64-kilobyte chunk of the SRAM on the Arcom board. To do this, we would call each of the three test routines in turn. In each case, the first parameter would be the base address of the memory block. Looking at our memory map, we see that the physical address is 10000h, which is represented by the segment:offset pair 0x1000:0000. The width of the data bus is 8 bits (a feature of the 80188EB processor), and there are a total of 64 kilobytes to be tested (corresponding to the rightmost 16 bits of the address bus).

If any of the memory test routines returns a nonzero (or non-NULL) value, we'll immediately turn on the red LED to visually indicate the error. Otherwise, after all three tests have completed successfully, we will turn on the green LED. In the event of an error, the test routine that failed will return some information about the problem encountered. This information can be useful when communicating with a hardware engineer about the nature of the problem. However, it is visible only if we are running the test program in a debugger or emulator.

The best way to proceed is to assume success, download the test program, and let it run to completion. Only if the red LED comes on do you need to use the debugger to step through the program and examine the return codes and the contents of memory to see which test failed and why.

#include "led.h"

#define BASE_ADDRESS   (volatile datum *) 0x10000000
#define NUM_BYTES      0x10000

/**********************************************************************
 *
 * Function:    main()
 *
 * Description: Test the second 64 KB bank of SRAM.
 *
 * Notes:
 *
 * Returns:     0 on success.
 *              Otherwise -1 indicates failure.
 *
 **********************************************************************/
main(void)
{
    if ((memTestDataBus(BASE_ADDRESS) != 0) ||
        (memTestAddressBus(BASE_ADDRESS, NUM_BYTES) != NULL) ||
        (memTestDevice(BASE_ADDRESS, NUM_BYTES) != NULL))
    {
        toggleLed(LED_RED);
        return (-1);
    }
    else
    {
        toggleLed(LED_GREEN);
        return (0);
    }

}   /* main() */

Unfortunately, it is not always possible to write memory tests in a high-level language. For example, C and C++ both require the use of a stack, and a stack itself requires working memory. Assuming that some RAM already works might be reasonable in a system that has more than one memory device. For example, you might create the stack in an area of RAM that is already known to be working while testing another memory device. A common arrangement is to test a small SRAM from assembly, create the stack there, and then test a larger block of DRAM using a more elaborate algorithm, like the one shown earlier. If you cannot assume enough working RAM for the stack and data needs of the test program, then you will need to rewrite these memory test routines entirely in assembly language.

Another option is to run the memory test program from an emulator. In this case, you could choose to place the stack in an area of the emulator's own internal memory. By moving the emulator's internal memory around in the target memory map, you could systematically test each memory device on the target.

The need for memory testing is perhaps most apparent during product development, when the reliability of the hardware and its design are still unproved. However, memory is one of the most critical resources in any embedded system, so it might also be desirable to include a memory test in the final release of your software. In that case, the memory test and other hardware confidence tests should be run each time the system is powered on or reset. Together, this initial test suite forms a set of hardware diagnostics. If one or more of the diagnostics fail, a repair technician can be called in to diagnose the problem and repair or replace the faulty hardware.

6.3 Validating Memory Contents

It does not usually make sense to perform the type of memory testing described earlier when dealing with ROM and hybrid memory devices. ROM devices cannot be written at all, and hybrid devices usually contain data or programs that cannot be overwritten. However, it should be clear that the same sorts of memory problems can occur with these devices. A chip might be missing or improperly inserted or physically or electrically damaged, or there could be an electrical wiring problem. Rather than just assuming that these nonvolatile memory devices are functioning properly, you would be better off having some way to confirm that the device is working and that the data it contains is valid. That's where checksums and cyclic redundancy codes come in.

6.3.1 Checksums

How can we tell if the data or program stored in a nonvolatile memory device is still valid? One of the easiest ways is to compute a checksum of the data when it is known to be good—prior to programming the ROM, for example. Then, each time you want to confirm the validity of the data, you need only recalculate the checksum and compare the result to the previously computed value. If the two checksums match, the data is assumed to be valid. By carefully selecting the checksum algorithm, we can increase the probability that specific types of errors will be detected.

The simplest checksum algorithm is to add up all the data bytes (or, if you prefer a 16-bit checksum, words), discarding carries along the way. A noteworthy weakness of this algorithm is that if all of the data (including the stored checksum) is accidentally overwritten with 0's, then this data corruption will be undetectable. The sum of a large block of zeros is also zero. The simplest way to overcome this weakness is to add a final step to the checksum algorithm: invert the result. That way, if the data and checksum are somehow overwritten with 0's, the test will fail because the proper checksum would be FFh.
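
As a concrete example, here is a minimal implementation of the sum-and-invert checksum just described (the function name checksum8 is mine, not part of the test code elsewhere in this chapter):

unsigned char
checksum8(const unsigned char * data, unsigned long nBytes)
{
    unsigned char  sum = 0;
    unsigned long  offset;

    for (offset = 0; offset < nBytes; offset++)
    {
        sum += data[offset];          /* Carries out of 8 bits are discarded. */
    }

    return ((unsigned char) ~sum);    /* Invert, so all-zero data fails.      */
}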

Unfortunately, a simple sum-of-data checksum like this one cannot detect many of the most common data errors. Clearly if one bit of data is corrupted (switched from 1 to 0, or vice versa), the error would be detected. But what if two bits from the very same "column" happened to be corrupted alternately (the first switches from 1 to 0, the other from 0 to 1)? The proper checksum does not change, and the error would not be detected. If bit errors can occur, you will probably want to use a better checksum algorithm. We'll see one of these in the next section.

After computing the expected checksum, we'll need a place to store it. One option is to compute the checksum ahead of time and define it as a constant in the routine that verifies the data. This method is attractive to the programmer but has several shortcomings. Foremost among them is the possibility that the data—and, as a result, the expected checksum—might change during the lifetime of the product. This is particularly likely if the data being tested is actually embedded software that will be periodically updated as bugs are fixed or new features added.

A better idea is to store the checksum at some fixed location in memory. For example, you might decide to use the very last location of the memory device being verified. This makes insertion of the checksum easy—just compute the checksum and insert it into the memory image prior to programming the memory device. When you recalculate the checksum, you simply skip over the location that contains the expected result, and compare the new result to the value stored there. Another good place to store the checksum is in another nonvolatile memory device. Both of these solutions work very well in practice.
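
Assuming the expected checksum is stored in the final byte of the region being checked, the verification step might look like the following sketch (built on the hypothetical checksum8 function above):

int
checksumIsValid(const unsigned char * base, unsigned long nBytes)
{
    /*
     * Recompute over everything except the stored checksum itself, then
     * compare the result against the value stored in the final byte.
     */
    return (checksum8(base, nBytes - 1) == base[nBytes - 1]);
}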

6.3.2 Cyclic Redundancy Codes

A cyclic redundancy code (CRC) is a specific checksum algorithm that is designed to detect the most common data errors. The theory behind the CRC is quite mathematical and beyond the scope of this book. However, cyclic redundancy codes are frequently useful in embedded applications that require the storage or transmission of large blocks of data. What follows is a brief explanation of the CRC technique and some source code that shows how it can be done in C. Thankfully, you don't need to understand why CRCs detect data errors—or even how they are implemented—to take advantage of their ability to detect errors.

Here's a very brief explanation of the mathematics. When computing a CRC, you consider the set of data to be a very long string of 1's and 0's (called the message). This binary string is divided—in a rather peculiar way—by a smaller fixed binary string called the generator polynomial. The remainder of this binary long division is the CRC checksum. By carefully selecting the generator polynomial for certain desirable mathematical properties, you can use the resulting checksum to detect most (but never all) errors within the message. The strongest of these generator polynomials are able to detect all single and double bit errors, and all odd-length strings of consecutive error bits. In addition, greater than 99.99% of all burst errors—defined as a sequence of bits that has one error at each end—can be detected. Together, these types of errors account for a large percentage of the possible errors within any stored or transmitted binary message.

Those generator polynomials with the very best error-detection capabilities are frequently adopted as international standards. Three such standards are parameterized in Table 6-4. Associated with each standard are its width (in bits), the generator polynomial, a binary representation of the polynomial called the divisor, an initial value for the remainder, and a value to XOR (exclusive or) with the final remainder.[2]

Table 6-4. International Standard Generator Polynomials
 

                        CCITT                    CRC16                    CRC32
Checksum size (width)   16 bits                  16 bits                  32 bits
Generator polynomial    x^16 + x^12 + x^5 + 1    x^16 + x^15 + x^2 + 1    x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
Divisor (polynomial)    0x1021                   0x8005                   0x04C11DB7
Initial remainder       0xFFFF                   0x0000                   0xFFFFFFFF
Final XOR value         0x0000                   0x0000                   0xFFFFFFFF

The code that follows can be used to compute any CRC formula that has a similar set of parameters.[3]

To make this as easy as possible, I have defined all of the CRC parameters as constants. To change to the CRC16 standard, simply change the values of the three constants. For CRC32, change the three constants and redefine width as type unsigned long.

/*
 * The CRC parameters.  Currently configured for CCITT.
 * Simply modify these to switch to another CRC standard.
 */
#define POLYNOMIAL          0x1021
#define INITIAL_REMAINDER   0xFFFF
#define FINAL_XOR_VALUE     0x0000

/*
 * The width of the CRC calculation and result.
 * Modify the typedef for an 8 or 32-bit CRC standard.
 */
typedef unsigned short width;

#define WIDTH   (8 * sizeof(width))
#define TOPBIT  (1 << (WIDTH - 1))
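
For example, switching the code above to the CRC32 standard from Table 6-4 would look like the following. (The explicit unsigned long suffixes and the change to TOPBIT are a precaution I am adding for compilers with 16-bit ints, such as those for the 80188EB.)

#define POLYNOMIAL          0x04C11DB7UL
#define INITIAL_REMAINDER   0xFFFFFFFFUL
#define FINAL_XOR_VALUE     0xFFFFFFFFUL

typedef unsigned long width;

#define WIDTH   (8 * sizeof(width))
#define TOPBIT  (1UL << (WIDTH - 1))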

The function crcInit should be called first. It implements the peculiar binary division required by the CRC algorithm. It will precompute the remainder for each of the 256 possible values of a byte of the message data. These intermediate results are stored in a global lookup table that can be used by the crcCompute function. By doing it this way, the CRC of a large message can be computed a byte at a time rather than bit by bit. This reduces the CRC calculation time significantly.

/*
 * An array containing the pre-computed intermediate result for each
 * possible byte of input.  This is used to speed up the computation.
 */
width  crcTable[256];

/**********************************************************************
 *
 * Function:    crcInit()
 *
 * Description: Initialize the CRC lookup table.  This table is used
 *              by crcCompute() to make CRC computation faster.
 *
 * Notes:       The mod-2 binary long division is implemented here.
 *
 * Returns:     None defined.
 *
 **********************************************************************/
void
crcInit(void)
{
    width  remainder;
    width  dividend;
    int    bit;


    /*
     * Perform binary long division, a bit at a time.
     */
    for (dividend = 0; dividend < 256; dividend++)
    {
        /*
         * Initialize the remainder.
         */
        remainder = dividend << (WIDTH - 8);

        /*
         * Shift and XOR with the polynomial.
         */
        for (bit = 0; bit < 8; bit++)
        {
            /*
             * Try to divide the current data bit.
             */
            if (remainder & TOPBIT)
            {
                remainder = (remainder << 1) ^ POLYNOMIAL;
            }
            else
            {
                remainder = remainder << 1;
            }
        }

        /*
         * Save the result in the table.
         */
        crcTable[dividend] = remainder;
    }

}   /* crcInit() */

Finally, we arrive at the actual workhorse routine, crcCompute. This is a routine that you can call over and over from your application to compute and verify CRC checksums. An additional benefit of splitting the computation between crcInit and crcCompute is that the crcInit function need not be executed on the embedded system. Instead, this function can be run in advance—on any computer—to produce the contents of the lookup table. The values in the table can then be stored in ROM (requiring only 256 bytes of storage) and referenced over and over by crcCompute.

/**********************************************************************
 *
 * Function:    crcCompute()
 *
 * Description: Compute the CRC checksum of a binary message block.
 *
 * Notes:       This function expects that crcInit() has been called
 *              first to initialize the CRC lookup table.
 *
 * Returns:     The CRC of the data.
 *
 **********************************************************************/
width
crcCompute(unsigned char * message, unsigned int nBytes)
{
    unsigned int   offset;
    unsigned char  byte;
    width          remainder = INITIAL_REMAINDER;


    /*
     * Divide the message by the polynomial, a byte at a time.
     */
    for (offset = 0; offset < nBytes; offset++)
    {
        byte = (remainder >> (WIDTH - 8)) ^ message[offset];
        remainder = crcTable[byte] ^ (remainder << 8);
    }

    /*
     * The final remainder is the CRC result.
     */
    return (remainder ^ FINAL_XOR_VALUE);


}   /* crcCompute() */
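
Putting the two routines together, a typical validation of a block of nonvolatile memory might look like the following sketch. The names romImage, ROM_IMAGE_SIZE, and storedCrc are hypothetical; they stand for the region being checked and the expected CRC recorded when the memory image was built.

extern unsigned char  romImage[];     /* Hypothetical: the data to validate.       */
extern const width    storedCrc;      /* Hypothetical: CRC saved with the image.   */

#define ROM_IMAGE_SIZE  0x8000        /* Hypothetical: size of the data, in bytes. */

int
romImageIsValid(void)
{
    crcInit();    /* Build the lookup table (or link in a precomputed copy). */

    return (crcCompute(romImage, ROM_IMAGE_SIZE) == storedCrc);
}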

6.4 Working with Flash Memory

From the programmer's viewpoint, Flash is arguably the most complicated memory device ever invented. The hardware interface has improved somewhat since the original devices were introduced in 1988, but there is still a long way to go. Reading from Flash memory is fast and easy, as it should be. In fact, reading data from a Flash is not all that different from reading from any other memory device.[4] The processor simply provides the address, and the memory device returns the data stored at that location. Most Flash devices enter this type of "read" mode automatically whenever the system is reset; no special initialization sequence is required to enable reading.

Writing data to a Flash is much harder. Three factors make writes difficult. First, each memory location must be erased before it can be rewritten. If the old data is not erased, the result of the write operation will be some logical combination of the old and new values, and the stored value will usually be something other than what you intended.

The second thing that makes writes to a Flash difficult is that only one sector, or block, of the device can be erased at a time; it is impossible to erase a single byte. The size of an individual sector varies by device, but it is usually on the order of several thousand bytes. For example, the Flash device on the Arcom board—an AMD 29F010—has eight sectors, each containing 16 kilobytes.

Finally, the process of erasing the old data and writing the new varies from one manufacturer to another and is usually rather complicated. These device programming interfaces are so awkward that it is usually best to add a layer of software to make the Flash memory easier to use. If implemented, this hardware-specific layer of software is usually called the Flash driver.

6.4.1 Flash Drivers

Because it can be difficult to write data to the Flash device, it often makes sense to create a Flash driver. The purpose of the Flash driver is to hide the details of a specific chip from the rest of the software. This driver should present a simple application programming interface (API) consisting of the erase and write operations. Parts of the application software that need to modify data stored in Flash memory simply call the driver to handle the details. This allows the application programmer to make high-level requests like "erase the block at address D0000h" or "write a block of data, beginning at address D4000h." It also keeps the device-specific code separate, so it can be easily modified if another manufacturer's Flash device is later used.

A Flash driver for the AMD 29F010 device on the Arcom board is shown below. This driver contains just two functions: flashErase and flashWrite. These functions erase an entire sector and write an array of bytes, respectively. You should be able to see from the code listings that the interaction with the Flash device is no picnic. This code will work only with an AMD 29F010 device. However, the same API could be used with any Flash memory device.

#include "tgt188eb.h"

/*
 * Features of the AMD 29F010 Flash memory device.
 */
#define FLASH_SIZE              0x20000
#define FLASH_BLOCK_SIZE        0x04000

#define UNLOCK1_OFFSET          0x5555
#define UNLOCK2_OFFSET          0x2AAA
#define COMMAND_OFFSET          0x5555

#define FLASH_CMD_UNLOCK1       0xAA
#define FLASH_CMD_UNLOCK2       0x55
#define FLASH_CMD_READ_RESET    0xF0
#define FLASH_CMD_AUTOSELECT    0x90
#define FLASH_CMD_BYTE_PROGRAM  0xA0
#define FLASH_CMD_ERASE_SETUP   0x80
#define FLASH_CMD_CHIP_ERASE    0x10
#define FLASH_CMD_SECTOR_ERASE  0x30

#define DQ7    0x80
#define DQ5    0x20

/**********************************************************************
 *
 * Function:    flashWrite()
 *
 * Description: Write data to consecutive locations in the Flash.
 *
 * Notes:       This function is specific to the AMD 29F010 Flash
 *              memory.  In that device, a byte that has been
 *              previously written must be erased before it can be
 *              rewritten successfully.
 *
 * Returns:     The number of bytes successfully written.
 *
 **********************************************************************/
int
flashWrite(unsigned char *      baseAddress,
           const unsigned char  data[],
           unsigned int         nBytes)
{
    unsigned char * flashBase = FLASH_BASE;
    unsigned int    offset;

    for (offset = 0; offset < nBytes; offset++)
    {
        /*
         * Issue the command sequence for byte program.
         */
        flashBase[UNLOCK1_OFFSET] = FLASH_CMD_UNLOCK1;
        flashBase[UNLOCK2_OFFSET] = FLASH_CMD_UNLOCK2;
        flashBase[COMMAND_OFFSET] = FLASH_CMD_BYTE_PROGRAM;

        /*
         * Perform the actual write operation.
         */
        baseAddress[offset] = data[offset];

        /*
         * Wait for the operation to complete or time-out.
         */
        while (((baseAddress[offset] & DQ7) != (data[offset] & DQ7)) &&
               !(baseAddress[offset] & DQ5));

        if ((baseAddress[offset] & DQ7) != (data[offset] & DQ7))
        {
            break;
        }
    }

    return (offset);

}   /* flashWrite() */

/**********************************************************************
 *
 * Function:    flashErase()
 *
 * Description: Erase a block of the Flash memory device.
 *
 * Notes:       This function is specific to the AMD 29F010 Flash
 *              memory.  In this device, individual sectors may be
 *              hardware protected.  If this algorithm encounters
 *              a protected sector, the erase operation will fail
 *              without notice.
 *
 * Returns:     0 on success.
 *              Otherwise -1 indicates failure.
 *
 **********************************************************************/
int
flashErase(unsigned char * sectorAddress)
{
    unsigned char * flashBase = FLASH_BASE;

    /*
     * Issue the command sequence for sector erase.
     */
    flashBase[UNLOCK1_OFFSET] = FLASH_CMD_UNLOCK1;
    flashBase[UNLOCK2_OFFSET] = FLASH_CMD_UNLOCK2;
    flashBase[COMMAND_OFFSET] = FLASH_CMD_ERASE_SETUP;
    flashBase[UNLOCK1_OFFSET] = FLASH_CMD_UNLOCK1;
    flashBase[UNLOCK2_OFFSET] = FLASH_CMD_UNLOCK2;

    *sectorAddress = FLASH_CMD_SECTOR_ERASE;

    /*
     * Wait for the operation to complete or time-out.
     */
    while (!(*sectorAddress & DQ7) && !(*sectorAddress & DQ5));

    if (!(*sectorAddress & DQ7))
    {
        return (-1);
    }

    return (0);

}   /* flashErase() */ 
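
Putting the two driver functions together, a caller that wants to replace the contents of one 16-kilobyte sector might do something like the following sketch. The sector choice and data buffer are hypothetical, and FLASH_BASE is assumed to be the byte pointer defined in tgt188eb.h, as in the driver above.

void
updateThirdSector(const unsigned char newImage[FLASH_BLOCK_SIZE])
{
    unsigned char * sector = (unsigned char *) FLASH_BASE + (2 * FLASH_BLOCK_SIZE);

    if (flashErase(sector) != 0)
    {
        /* The sector could not be erased (perhaps hardware protected). */
        return;
    }

    if (flashWrite(sector, newImage, FLASH_BLOCK_SIZE) != FLASH_BLOCK_SIZE)
    {
        /* The write terminated early; handle the error. */
    }
}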

Of course, this is just one possible way to interface to a Flash memory—and not a particularly advanced one at that. In particular, this implementation does not handle any of the chip's possible errors. What if the erase operation never completes? The function flashErase will just keep spinning its wheels, waiting for that to occur. A more robust implementation would use a software time-out as a backup. For example, if the Flash device doesn't respond within twice the maximum expected time (as stated in the databook), the routine could stop polling and indicate the error to the caller (or user) in some way.
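
One way to add such a time-out, sketched below, is simply to bound the polling loop with an iteration counter. The loop limit here is a hypothetical placeholder; a real implementation would derive it from the databook's maximum erase time, ideally using a hardware timer rather than a raw loop count.

#define ERASE_TIMEOUT_LOOPS  1000000L    /* Hypothetical polling limit. */

int
flashEraseWithTimeout(unsigned char * sectorAddress)
{
    unsigned char * flashBase = FLASH_BASE;
    long            timeout   = ERASE_TIMEOUT_LOOPS;

    /*
     * Issue the same sector erase command sequence as flashErase().
     */
    flashBase[UNLOCK1_OFFSET] = FLASH_CMD_UNLOCK1;
    flashBase[UNLOCK2_OFFSET] = FLASH_CMD_UNLOCK2;
    flashBase[COMMAND_OFFSET] = FLASH_CMD_ERASE_SETUP;
    flashBase[UNLOCK1_OFFSET] = FLASH_CMD_UNLOCK1;
    flashBase[UNLOCK2_OFFSET] = FLASH_CMD_UNLOCK2;

    *sectorAddress = FLASH_CMD_SECTOR_ERASE;

    /*
     * Poll for completion, but give up once the loop limit expires.
     */
    while (!(*sectorAddress & DQ7) && !(*sectorAddress & DQ5) && (--timeout > 0));

    if (!(*sectorAddress & DQ7))
    {
        return (-1);    /* Erase failed, was blocked, or timed out. */
    }

    return (0);
}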

Another thing that people sometimes do with Flash memory is to implement a small filesystem. Because the Flash memory provides nonvolatile storage that is also rewriteable, it can be thought of as similar to any other secondary storage system, such as a hard drive. In the filesystem case, the functions provided by the driver would be more file-oriented. Standard filesystem functions like open, close, read, and write provide a good starting point for the driver's programming interface. The underlying filesystem structure can be as simple or complex as your system requires. However, a well-understood format like the File Allocation Table (FAT) structure used by DOS is good enough for most embedded projects.
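
If you take the filesystem route, the driver's programming interface might instead consist of file-oriented calls along these lines. The names and signatures below are purely illustrative; they are not part of any standard API.

int  flashFileOpen(const char * name, int mode);
int  flashFileClose(int fd);
int  flashFileRead(int fd, void * buffer, unsigned int nBytes);
int  flashFileWrite(int fd, const void * buffer, unsigned int nBytes);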

[1]  128 kilobytes is one-eighth of the total 1-megabyte address space.

[2]  The divisor is simply a binary representation of the coefficients of the generator polynomial—each of which is either 0 or 1. To make this even more confusing, the highest-order coefficient of the generator polynomial (always a 1) is left out of the binary representation. For example, the polynomial in the first standard, CCITT, has four nonzero coefficients. But the corresponding binary representation has only three 1's in it (bits 12, 5, and 0).

[3]   There is one other potential twist called "reflection" that my code does not support. You probably won't need that anyway.

[4]  There is one small difference worth noting here. The erase and write cycles take longer than the read cycle. So if a read is attempted in the middle of one of those operations, the result will be either delayed or incorrect, depending on the device.